Let's look at an example using Apache PDFBox to merge two or more PDF files into a single PDF.
This will demonstrate taking two separate PDF files, reading them as binary, and using PDFBox's PDFMergerUtility
to append them into a new document object before writing the final file out. The whole process will work with the files in memory using some Java input and output streams.
The example PDFs used for merging will just be plain files with some dummy text, but any PDF will do.
This demo will set everything up to use CommandBox for the JAR dependencies and running the server instance.
// box.json
{
    "name":"Merge PDFs",
    "dependencies":{
        "pdfbox-2.0.26":"jar:https://search.maven.org/remotecontent?filepath=org/apache/pdfbox/pdfbox/2.0.26/pdfbox-2.0.26.jar",
        "fontbox-2.0.26":"jar:https://search.maven.org/remotecontent?filepath=org/apache/pdfbox/fontbox/2.0.26/fontbox-2.0.26.jar"
    },
    "installPaths":{
        "pdfbox-2.0.26":"lib/pdfbox-2.0.26/",
        "fontbox-2.0.26":"lib/fontbox-2.0.26/"
    }
}
From within the project directory, running box install will pull down the JAR files and place them in the /lib folder.
// Application.cfc
component {
    this.name = hash(getBaseTemplatePath());
    this.applicationTimeout = createTimeSpan(0, 2, 0, 0);
    this.javaSettings.loadPaths = directoryList(expandPath("/lib"), true, "path", "*.jar");
    this.mappings = {
        // Location of the PDFs to be merged
        "/resources": expandPath("/resources"),
        // Location of the merging function
        "/components": expandPath("/components"),
        // Location of the final file
        "/output": expandPath("/output")
    };
}
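Before moving on, it can help to confirm that the JARs were actually picked up by this.javaSettings.loadPaths. The following is a minimal sketch, assuming it is dropped into a scratch template inside the same application (the file name check.cfm is hypothetical and not part of the demo):

```cfml
<cfscript>
// check.cfm (hypothetical scratch template, not part of the original demo)

// List the JAR files that Application.cfc hands to javaSettings.loadPaths
jars = directoryList(expandPath("/lib"), true, "path", "*.jar");
writeDump(var = jars, label = "JARs found under /lib");

// If the JARs were loaded, createObject() resolves the PDFBox class;
// otherwise it throws, which usually points at a missing or misplaced JAR
try {
    merger = createObject("java", "org.apache.pdfbox.multipdf.PDFMergerUtility").init();
    writeOutput("Loaded: " & merger.getClass().getName());
}
catch (any e) {
    writeOutput("PDFBox could not be loaded: " & e.message);
}
</cfscript>
```

If the second message appears, re-running box install and restarting the server is a reasonable first thing to try.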
I'll break this down more below, but here's a quick rundown of what's going on:
- Create a new PDDocument as an empty container for the final file. This is used to append the PDFs.
- Save the PDDocument to a stream in memory and add it to the final file instance to be merged and written out to an actual file.
// /components/MergePDF.cfc
component displayname="Merge PDF" output=false {

    public void function mergePDF(
        required array files,
        required string filePath
    ) {
        try {
            var ByteArrayOutputStream = createObject("java", "java.io.ByteArrayOutputStream");
            var ByteArrayInputStream = createObject("java", "java.io.ByteArrayInputStream");
            var BufferedInputStream = createObject("java", "java.io.BufferedInputStream");
            var PDFMergerUtility = createObject("java", "org.apache.pdfbox.multipdf.PDFMergerUtility");
            var MemoryUsageSetting = createObject("java", "org.apache.pdfbox.io.MemoryUsageSetting");
            var PDDocument = createObject("java", "org.apache.pdfbox.pdmodel.PDDocument");

            // Resolve the output directory and create it if it does not exist.
            // getDirectoryFromPath() keeps the leading slash of absolute paths intact.
            var fileLocation = getDirectoryFromPath(arguments.filePath);
            if (!directoryExists(fileLocation)) directoryCreate(fileLocation);

            // Create merge util instance
            var pdfMerger = PDFMergerUtility.init();

            // Final output file (same instance as pdfMerger, named for readability)
            var finalFile = pdfMerger;
            finalFile.setDestinationFileName(arguments.filePath);

            // Create the main document
            var mainDoc = PDDocument.init();

            // Load documents and append to main document
            arguments.files.each((page) => {
                var document = PDDocument.load(arguments.page);
                pdfMerger.appendDocument(mainDoc, document);
                document.close();
            });

            // Save PDF to output stream and convert to byte array
            var mainDocOS = ByteArrayOutputStream.init();
            mainDoc.save(mainDocOS);
            var mainDocBA = mainDocOS.toByteArray();
            mainDoc.close();
            mainDocOS.close();

            // Stream to final PDF file
            var mainDocIS = BufferedInputStream.init(
                ByteArrayInputStream.init(mainDocBA)
            );
            finalFile.addSource(mainDocIS);

            // Write the final PDF file
            finalFile.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
            mainDocIS?.close();
        }
        catch (any e) {
            writeDump(e);
        }
        finally {
            // Ensure on success OR error, all streams are closed!
            // Each loaded document is local to the loop closure and is closed there.
            // PDFMergerUtility has no close() method in PDFBox 2.0.x, so finalFile is not closed here.
            mainDoc?.close();
            mainDocOS?.close();
            mainDocIS?.close();
        }
    }

}
After the Java classes are defined, an instance of the PDFMergerUtility is created and set to a new variable. From there we can define the file destination for later.
// Create merge util instance
var pdfMerger = PDFMergerUtility.init();
// Final output file
var finalFile = pdfMerger;
finalFile.setDestinationFileName(arguments.filePath);
We then create a new PDDocument instance for holding the PDFs as they are appended. As the collection of binary PDFs is looped over, the PDDocument object is used to load each file as something PDFBox can use. The file is then appended to the main PDDocument instance using the PDFMergerUtility.
// Create the main document
var mainDoc = PDDocument.init();
// Load documents and append to main document
arguments.files.each((page) => {
    var document = PDDocument.load(arguments.page);
    pdfMerger.appendDocument(mainDoc, document);
    document.close();
});
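One thing the loop above does not cover is the case where appendDocument throws: the loaded document would then stay open. As a hedged sketch of one way to handle that, the loop could be moved into a small helper that wraps each append in try/finally. The function name appendAll and its parameters are hypothetical and not part of the original component:

```cfml
// Hypothetical helper, not part of the original component: appends each
// binary PDF to mainDoc and closes the source document even if appending fails
void function appendAll(
    required any pdfMerger,
    required any mainDoc,
    required array files
) {
    var PDDocument = createObject("java", "org.apache.pdfbox.pdmodel.PDDocument");

    for (var page in arguments.files) {
        // Parse the binary into a PDDocument that PDFBox can work with
        var document = PDDocument.load(page);
        try {
            arguments.pdfMerger.appendDocument(arguments.mainDoc, document);
        }
        finally {
            // Runs on success and on error
            document.close();
        }
    }
}
```

Inside mergePDF it would be called as appendAll(pdfMerger, mainDoc, arguments.files) in place of the each() loop.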
Once all of the PDFs have been appended, the main document object is saved to a ByteArrayOutputStream in memory and converted to a byte array.
The byte array data is then fed to a BufferedInputStream and added to the PDFMergerUtility instance representing the final file.
At this point, the mergeDocuments function is called on the PDFMergerUtility instance, finalizing and writing the file to the designated location.
In this example, we pass an instance of MemoryUsageSetting to the function telling the process to store each file being merged temporarily in memory. There are a few options available for this process, including the ability to physically write each file being merged to a temp location if desired.
// Save PDF to output stream and convert to byte array
var mainDocOS = ByteArrayOutputStream.init();
mainDoc.save(mainDocOS);
var mainDocBA = mainDocOS.toByteArray();
mainDoc.close();
mainDocOS.close();
// Stream to final PDF file
var mainDocIS = BufferedInputStream.init(
    ByteArrayInputStream.init(mainDocBA)
);
finalFile.addSource(mainDocIS);
// Write the final PDF file
finalFile.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
mainDocIS?.close();
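To make the MemoryUsageSetting options mentioned above a little more concrete, here is a rough sketch of the alternatives PDFBox 2.0.x offers alongside setupMainMemoryOnly(). The 50 MB limit and the use of the CFML engine's temp directory are arbitrary choices for illustration, not recommendations from the original example:

```cfml
<cfscript>
MemoryUsageSetting = createObject("java", "org.apache.pdfbox.io.MemoryUsageSetting");

// Keep everything in main memory (what the example above uses)
inMemory = MemoryUsageSetting.setupMainMemoryOnly();

// Buffer everything in temporary files on disk instead of main memory
tempFileOnly = MemoryUsageSetting.setupTempFileOnly();

// Use up to roughly 50 MB of main memory, then spill over to temporary files.
// The limit is an arbitrary value chosen for illustration.
mixed = MemoryUsageSetting.setupMixed(javaCast("long", 50 * 1024 * 1024));

// Optionally point the temporary files at a specific directory
// (getTempDirectory() is the CFML engine's temp folder, used here as an example)
mixed.setTempDir(createObject("java", "java.io.File").init(getTempDirectory()));

// Any of these can be passed in place of setupMainMemoryOnly(), for example:
// finalFile.mergeDocuments(mixed);
</cfscript>
```

For a handful of small files the in-memory setting is likely fine; the temp file variants are mostly of interest when merging large documents.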
Passing an array of PDFs as binary values to the function produces a single PDF containing every file, appended in array order.
<!--- index.cfm --->
<cfset pdfUtil = new components.MergePDF()>
<cfset page1 = fileReadBinary("/resources/append1.pdf")>
<cfset page2 = fileReadBinary("/resources/append2.pdf")>
<cfset filePath = expandPath("/output/appendpdf.pdf")>
<cfset pdfUtil.mergePDF(files = [page1, page2], filePath = filePath)>
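The same call also works with more than two files. As a rough sketch, assuming every PDF in /resources should be merged in file name order, the array can be built from a directory listing, and the result can be reopened to check its page count. This variation is not part of the original demo:

```cfml
<cfscript>
// Collect every PDF under /resources, sorted by name, and read each as binary
pdfPaths = directoryList(expandPath("/resources"), false, "path", "*.pdf", "name asc");
files = pdfPaths.map((path) => fileReadBinary(path));

filePath = expandPath("/output/appendpdf.pdf");
new components.MergePDF().mergePDF(files = files, filePath = filePath);

// Reopen the result and report its page count as a quick sanity check
PDDocument = createObject("java", "org.apache.pdfbox.pdmodel.PDDocument");
merged = PDDocument.load(createObject("java", "java.io.File").init(filePath));
try {
    writeOutput("Merged " & arrayLen(files) & " files into " & merged.getNumberOfPages() & " pages");
}
finally {
    merged.close();
}
</cfscript>
```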
Cheers & Happy Coding!